Near Optimal Dimensionality Reductions That Preserve Volumes

Authors

  • Avner Magen
  • Anastasios Zouzias
Abstract

Let $P$ be a set of $n$ points in Euclidean space and let $0 < \varepsilon < 1$. A well-known result of Johnson and Lindenstrauss states that there is a projection of $P$ onto a subspace of dimension $O(\varepsilon^{-2} \log n)$ such that distances change by at most a factor of $1 + \varepsilon$. We consider an extension of this result. Our goal is to find an analogous dimension reduction where not only pairs but all subsets of at most $k$ points maintain their volume approximately. More precisely, we require that sets of size $s \le k$ preserve their volumes within a factor of $(1 + \varepsilon)^{s-1}$. We show that this can be achieved using $O(\max\{k/\varepsilon,\ \varepsilon^{-2} \log n\})$ dimensions. In particular, for $k = O(\log n / \varepsilon)$ we require asymptotically no more dimensions than the special case $k = 2$ handled by Johnson and Lindenstrauss. Our work improves on a result of Magen, which required as many as $O(k \varepsilon^{-2} \log n)$ dimensions, and is tight up to a factor of $O(1/\varepsilon)$. Another outcome of our work is an alternative and greatly simplified proof of the result of Magen showing that all distances between points and affine subspaces spanned by a small number of points are approximately preserved when projecting onto $O(k \varepsilon^{-2} \log n)$ dimensions.
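The volume-preservation requirement in the abstract can be checked empirically. The sketch below is an illustration only, not the authors' construction: it assumes a scaled Gaussian random matrix as the projection (a standard Johnson-Lindenstrauss choice), and the helper names `gaussian_projection`, `determinant`, and `simplex_volume` are hypothetical. It measures the volume of the simplex spanned by $s$ points through the Gram determinant of its edge vectors, so the ratio of volumes after and before projection can be compared against the target factor $(1 + \varepsilon)^{s-1}$.

```python
import math
import random


def gaussian_projection(points, d, rng):
    """Map points from R^D to R^d with a d x D matrix of N(0, 1/d) entries.

    Assumption: a Gaussian random matrix stands in for the projection
    mentioned in the abstract.
    """
    D = len(points[0])
    scale = 1.0 / math.sqrt(d)
    rows = [[rng.gauss(0.0, 1.0) * scale for _ in range(D)] for _ in range(d)]
    return [[sum(r[j] * p[j] for j in range(D)) for r in rows] for p in points]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= f * a[c][j]
    return det


def simplex_volume(pts):
    """Volume of the simplex spanned by s points: sqrt(det(Gram)) / (s-1)!.

    For s = 2 this is the distance between the two points.
    """
    base = pts[0]
    edges = [[x - b for x, b in zip(p, base)] for p in pts[1:]]
    gram = [[sum(x * y for x, y in zip(u, v)) for v in edges] for u in edges]
    return math.sqrt(max(determinant(gram), 0.0)) / math.factorial(len(edges))


if __name__ == "__main__":
    rng = random.Random(1)
    D, d, n, eps = 500, 150, 8, 0.25  # illustrative sizes, not from the paper
    pts = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(n)]
    proj = gaussian_projection(pts, d, rng)
    for s in (2, 3, 4):
        ratio = simplex_volume(proj[:s]) / simplex_volume(pts[:s])
        print(f"s={s}: volume ratio {ratio:.3f}, target factor {(1 + eps) ** (s - 1):.3f}")
```

Running the script prints one ratio per subset size; the sizes `D`, `d`, `n`, and `eps` are arbitrary demonstration values and carry no guarantee from the paper's bounds.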

Similar resources

6.1 Dimensionality Reduction

Previously in the course, we have discussed algorithms suited for a large number of data points. This lecture discusses when the dimensionality of the data points becomes large. We denote the data set as $x_1, x_2, \ldots, x_n \in \mathbb{R}^D$ for $D \gg n$, and will consider dimensionality reductions $f : \mathbb{R}^D \to \mathbb{R}^d$ for $d \ll D$. We would like the function $f$ to preserve some properties of the original data set, such a...

An Intelligent Credit Forecasting System Using Supervised Nonlinear Dimensionality Reductions

Kernel classifiers (such as support vector machines) have been successfully applied in numerous areas, and have demonstrated excellent performance. However, due to the high dimensionality and nonlinear distribution of financial input data in credit rating forecasting, finding a suitable low dimensional subspace by nonlinear dimensionality reductions is a key step to improve classifier performan...

Euclidean Embeddings that Preserve Volumes

Let $P$ be a set of $n$ points in Euclidean space and let $0 < \varepsilon < 1$. A well-known result of Johnson and Lindenstrauss states that there is a projection of $P$ onto a subspace of dimension $O(\varepsilon^{-2} \log n)$ such that distances change by a factor of $1 + \varepsilon$ at most. We consider an extension of this result. Our goal is to find an analogous dimension reduction where not only pairs, but all subsets of at most $k$ p...

A Scalable DBMS for Large Scientific Simulations

Scientific simulations evolve rather fast. Both the logical organization of the underlying database and the scientist's view of data change rapidly. The underlying DBMS must provide appropriate support for the evolution of scientific simulations, their rapidly increasing computational intensity, as well as the growing volumes and dimensionality of scientific data. ADAMS is a dynamic and scalable a...

An approximate dynamic programming framework for modeling global climate policy under decision-dependent uncertainty

Analyses of global climate policy as a sequential decision under uncertainty have been severely restricted by dimensionality and computational burdens. Therefore, they have limited the number of decision stages, discrete actions, or number and type of uncertainties considered. In particular, other formulations have difficulty modeling endogenous or decision-dependent uncertainties, in which the...

Publication date: 2008